Who did What: A Large-Scale Person-Centered Cloze Dataset
Abstract
We have constructed a new “Who-did-What” (WDW) dataset of over 200,000 fill-in-the-gap (cloze) multiple-choice reading comprehension problems built from the LDC English Gigaword newswire corpus. The WDW dataset has a variety of novel features. First, in contrast with the CNN and Daily Mail datasets (Hermann et al., 2015), we avoid using article summaries for question formation. Instead, each problem is formed from two independent articles: an article given as the passage to be read and a separate article on the same events used to form the question. Second, we avoid anonymization: each choice is a person named entity. Third, the problems have been filtered to remove a fraction that are easily solved by simple baselines, while remaining 84% solvable by humans. We report performance benchmarks of standard systems and propose the WDW dataset as a challenge task for the community.
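As a rough illustration of the problem format and the baseline filtering described in the abstract, the following is a minimal sketch, not the authors' pipeline. The `ClozeProblem` class, its field names, the `XXX` blank token, the frequency baseline, and the made-up names in the example are all assumptions introduced here. The abstract says only that "a fraction" of easily solved problems is removed, so dropping every baseline-solvable problem is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClozeProblem:
    """Hypothetical container for one problem; field names are illustrative."""
    passage: str        # article given as the text to be read
    question: str       # sentence from a second article, answer replaced by a blank
    choices: List[str]  # candidate person names
    answer: str         # the person name that was blanked out


def most_frequent_choice(problem: ClozeProblem) -> str:
    """Simple baseline (assumed): pick the choice mentioned most often in the passage."""
    return max(problem.choices, key=problem.passage.count)


def filter_easy(
    problems: List[ClozeProblem],
    baselines: List[Callable[[ClozeProblem], str]],
) -> List[ClozeProblem]:
    """Keep only problems that none of the simple baselines answers correctly."""
    return [p for p in problems if not any(b(p) == p.answer for b in baselines)]


# Made-up example problems, not taken from the dataset.
easy = ClozeProblem(
    passage="Alice Smith met Bob Jones on Monday. Alice Smith later announced the deal.",
    question="XXX announced the deal after meeting Bob Jones.",
    choices=["Alice Smith", "Bob Jones"],
    answer="Alice Smith",
)
hard = ClozeProblem(
    passage="Alice Smith praised Alice Smith's team. Bob Jones signed the contract.",
    question="XXX signed the contract.",
    choices=["Alice Smith", "Bob Jones"],
    answer="Bob Jones",
)

kept = filter_easy([easy, hard], [most_frequent_choice])
```

Here the frequency baseline solves `easy` but not `hard`, so only `hard` survives the filter.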
Similar resources
Large-scale Cloze Test Dataset Designed by Teachers
The cloze test is widely adopted in language exams to evaluate students’ language proficiency. In this paper, we propose the first large-scale human-designed cloze test dataset, CLOTH, in which the questions were used in middle-school and high-school language exams. With the missing blanks carefully created by teachers and candidate choices purposely designed to be confusing, CLOTH requires a deep...
Dataset for the First Evaluation on Chinese Machine Reading Comprehension
Machine Reading Comprehension (MRC) has become enormously popular recently and has attracted a lot of attention. However, existing reading comprehension datasets are mostly in English. To add diversity in reading comprehension datasets, in this paper we propose a new Chinese reading comprehension dataset for accelerating related research in the community. The proposed dataset contains two diff...
The Effect of the Modified Cloze Procedure on the Writing Proficiency of Iranian Intermediate EFL Learners
The present study was conducted to investigate the effect of the modified cloze procedure on the writing proficiency of Iranian intermediate EFL learners. To fulfill the purpose of the study, 110 participants studying at Semnan University and majoring in English literature were tested on the CELT proficiency test. 65 participants who were found to be homogeneous were selected and assigned randomly to two groups...
A Pilot Study of Biomedical Text Comprehension using an Attention-Based Deep Neural Reader: Design and Experimental Analysis
BACKGROUND: With the development of artificial intelligence (AI) technology centered on deep learning, the computer has evolved to a point where it can read a given text and answer a question based on the context of the text. Such a specific task is known as the task of machine comprehension. Existing machine comprehension tasks mostly use datasets of general texts, such as news articles or elem...
Emergent Predication Structure in Vector Representations of Neural Readers
Reading comprehension is a question answering task where the answer is to be found in a given passage about entities and events not mentioned in general knowledge sources. A significant number of neural architectures for this task (neural readers) have recently been developed and evaluated on large cloze-style datasets. We present experiments supporting the emergence of “predication structure” ...
Publication date: 2016